Add --run-in-docker to skill-validator to run Copilot CLI in a docker container#176
Add --run-in-docker to skill-validator to run Copilot CLI in a docker container#176caaavik-msft wants to merge 8 commits intodotnet:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds an optional --run-in-docker execution mode to eng/skill-validator so Copilot CLI runs headlessly inside a Docker container, with host/container path translation and read-only skill mounts to reduce host impact during evaluations.
Changes:
- Introduce
DockerCopilotServerto build/run a Copilot CLI container, map work/skill paths, and exec setup commands in-container. - Wire Docker-mode path mapping into agent sessions and judges; add
--run-in-dockerconfig/flag and documentation. - Add Dockerfile to the build output and new unit tests for mount/path mapping logic.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| eng/skill-validator/src/Services/DockerCopilotServer.cs | Implements Docker image build/run, mounts, path translation, and container lifecycle management. |
| eng/skill-validator/src/Services/AgentRunner.cs | Uses Docker CLI URL when enabled; maps paths; runs setup commands via docker exec; maps permission paths back to host. |
| eng/skill-validator/src/Services/Judge.cs | Maps working directory to container path for judge sessions in Docker mode. |
| eng/skill-validator/src/Services/PairwiseJudge.cs | Maps working directory to container path for pairwise judge sessions in Docker mode. |
| eng/skill-validator/src/Commands/ValidateCommand.cs | Adds --run-in-docker, initializes Docker server before runs, and stops it during normal cleanup. |
| eng/skill-validator/src/Models/Models.cs | Adds RunInDocker to ValidatorConfig. |
| eng/skill-validator/src/SkillValidator.csproj | Copies Dockerfile into output so it’s available at runtime. |
| eng/skill-validator/src/Docker/Dockerfile | Builds a base image and installs the Copilot CLI binary from the SDK package. |
| eng/skill-validator/README.md | Documents Docker mode requirements and usage. |
| eng/skill-validator/tests/DockerCopilotServerTests.cs | Adds unit tests for mount computation and host/container path mapping. |
Comments suppressed due to low confidence (1)
eng/skill-validator/src/Commands/ValidateCommand.cs:169
- In Docker mode,
DockerCopilotServer.Initialize(...)happens before early-return paths in model validation (invalid model / exception). SinceGetSharedClient()can start the Docker container during model validation, returning early here can leak a running container and temp dirs. Wrap the body ofRunin a try/finally that always callsStopSharedClient(),dockerServer.StopAsync(), andCleanupWorkDirs()(and/or clearDockerCopilotServer.Instance) even on early exits.
// Set up DockerCopilotServer with skill directories to mount
if (config.RunInDocker)
DockerCopilotServer.Initialize(config.Verbose, allSkills);
// Validate model early
try
{
var client = await AgentRunner.GetSharedClient(config.Verbose);
var models = await client.ListModelsAsync();
var modelIds = models.Select(m => m.Id).ToList();
var modelsToValidate = new List<string> { config.Model };
if (config.JudgeModel != config.Model) modelsToValidate.Add(config.JudgeModel);
foreach (var m in modelsToValidate)
{
if (!modelIds.Contains(m))
{
Console.Error.WriteLine($"Invalid model: \"{m}\"\nAvailable models: {string.Join(", ", modelIds)}");
return 1;
}
}
Console.WriteLine($"Using model: {config.Model}" +
(config.JudgeModel != config.Model ? $", judge: {config.JudgeModel}" : "") +
$", judge-mode: {config.JudgeMode}");
}
catch (Exception error)
{
Console.Error.WriteLine($"Failed to validate model: {error}");
return 1;
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (!Path.IsPathRooted(relativePath) && | ||
| !relativePath.StartsWith(".." + Path.DirectorySeparatorChar, StringComparison.Ordinal) && | ||
| relativePath != "..") | ||
| { | ||
| return Path.Combine(containerMount, relativePath).Replace("\\", "/"); | ||
| } |
There was a problem hiding this comment.
NIT: This repeats and should be extracted to a helper function.
Also the separators normalization part might be needed regardles of path being rooted
There was a problem hiding this comment.
Made this change, although the non-rooted path returns an exception. If the relative path is rooted, it means that they are on different drives. This code just means that only paths that are mapped inside the container will return without exception.
| "run", | ||
| "--name", containerName, | ||
| "-p", $"0:{InternalPort}", // Map internal port to a random host port | ||
| "-e", "GITHUB_TOKEN", |
There was a problem hiding this comment.
This feels like anything within the container (e.g. misbehaving skill) can now use/flush the token.
cc: @ViktorHofer
There was a problem hiding this comment.
I just checked to see how the Copilot SDK uses GitHubToken and it looks like it just passes it in via an environment variable anyways https://github.com/github/copilot-sdk/blob/4e1499dd23709022c720eaaa5457d00bf0cb3977/dotnet/src/Client.cs#L1003-L1006. So I think it's not going to be practical to hide that. I think Copilot CLI would be the only practical place to fix this by having it hide the env var from any child subprocesses it creates. Also with full shell access, a misbehaving skill could always just run gh auth token to get a new token.
There was a problem hiding this comment.
There is a sample here actually showing a way to solve this properly, but it would make things too complicated: https://github.com/github/copilot-sdk/tree/main/test/scenarios/bundling/container-proxy
There was a problem hiding this comment.
I think Copilot CLI would be the only practical place to fix this by having it hide the env var from any child subprocesses it creates.
Damn, we should definitely fix this in the Copilot SDK. Would you mind filing an issue?
Overall, as long as we don't use this in CI where we have less control over the PAT, this is fine.
JanKrivanek
left a comment
There was a problem hiding this comment.
Looks good to me if it's only for local usage, not to be used in PR review pipeline (so opt-in is good here - as contributors cannot change it).
Some of this might be alleviated with the support for msbench that we are working on
|
I assume you now have maintainer permissions. Can you please submit this change again from a non forked branch? Annoying, I know but we don't honor skill-validator changes in forks. |
|
Reopened in #273 |
Summary
This PR adds an optional Docker execution mode to
skill-validatorso agent runs, judges, and setup commands can execute in an isolated container instead of directly on the host machine.Motivation
The main use case for this is for local development, but it might also be useful for running in CI if we want to build on top of it. I was building some skills and found that when using some weaker models, they made destructive changes to my host system to accomplish the task (e.g. reinstalling .NET). With this, agents and judges run inside a container with only access to the files they need bound to the host machine. This does not add any additional security measures for network isolation.
Implementation
This makes use of the
--headlessmode for running copilot as described here: https://github.com/github/copilot-sdk/blob/main/docs/guides/setup/backend-services.md.It requires a
GITHUB_TOKENbe present to pass into the container so that it can use that to authenticate to the Copilot API. I have an example in theREADMEwhich explains that you can get this token withgh auth token. For people with multiple gh accounts (e.g. personal and enterprise), you can also dogh auth token --user <name>.A Dockerfile is included in the repo to use as the base image:
This ensures that we use the exact same Copilot CLI binary that is shipped with the SDK. The SDK version is resolved programmatically inside the SkillValidator so it is kept in sync. It places the copilot binary at
/usr/local/bin/copilotinside the container.To handle path mapping/translation, when running in docker mode, all temp/work directories are placed inside a single directory in the TMP folder, and that entire directory is mounted into the container with read-write. This makes it easy to map paths to and from the host and container equivalent when needed. Skill directories are also mounted into the container with read-only access, and only the directories that are being evaluated will be mounted.
The container uses a randomised port
-p 0:4321which is resolved later usingdocker port. The container is always cleaned up after finishing, including onProcessExitandCancelKeyPressevents.Future Extensibility
I have a proof of concept working locally which I chose not to push for now to keep this PR simple which runs all agents inside their own containers rather than having a single container that is used to run all agents and judges. This would help reduce any risks of agents modifying the environment and impacting other evaluations if that sounds desirable, but it does mean that each agent would use a separate
CopilotClientrather a single sharedCopilotClient.